Observed coin flips: H, H, H, T, H, H, H, H, H
Lecture 05
February 5, 2024
\[\underbrace{h_t}_{\substack{\text{hare} \\ \text{pelts}}} \sim \text{LogNormal}(\log(\underbrace{p_H}_{\substack{\text{trap} \\ \text{rate}}} H_t), \sigma_H)\] \[l_t \sim \text{LogNormal}(\log(p_L L_t), \sigma_L)\]
\[ \begin{align*} \frac{dH}{dt} &= H_t b_H - H_t (L_t m_H) \\ H_T &= H_1 + \int_1^T \frac{dH}{dt}dt \end{align*} \]
\[ \begin{align*} \frac{dL}{dt} &= L_t (H_t b_L) - L_t m_L \\ L_T &= L_1 + \int_1^T \frac{dL}{dt}dt \end{align*} \]
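The dynamics above can be integrated numerically; a minimal forward-Euler sketch (the parameter values and initial populations below are illustrative assumptions, not fitted values):

```julia
# forward-Euler integration of the hare-lynx dynamics (illustrative parameters)
function simulate(H1, L1, b_H, m_H, b_L, m_L; T=50.0, dt=0.01)
    n = Int(T / dt)
    H = zeros(n + 1); L = zeros(n + 1)
    H[1], L[1] = H1, L1
    for t in 1:n
        dH = H[t] * b_H - H[t] * (L[t] * m_H)  # hare births minus predation losses
        dL = L[t] * (H[t] * b_L) - L[t] * m_L  # lynx births minus deaths
        H[t+1] = H[t] + dH * dt
        L[t+1] = L[t] + dL * dt
    end
    return H, L
end

H, L = simulate(10.0, 5.0, 0.5, 0.05, 0.02, 0.4)
```

In practice a dedicated solver (e.g. DifferentialEquations.jl) would be used instead of hand-rolled Euler steps, but the structure is the same: accumulate the derivative from the initial condition.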
So far: no way to use prior information about parameters (other than bounds on MLE optimization).
For example: what “trap rates” are more plausible?
Original version (Bayes, 1763):
\[P(A | B) = \frac{P(B | A) \times P(A)}{P(B)} \quad \text{if} \quad P(B) \neq 0.\]
“Modern” version (Laplace, 1774):
\[\underbrace{{p(\theta | y)}}_{\text{posterior}} = \frac{\overbrace{p(y | \theta)}^{\text{likelihood}}}{\underbrace{p(y)}_\text{normalization}} \overbrace{p(\theta)}^\text{prior}\]
The version of Bayes’ rule that matters most for (approximately) 95% of Bayesian statistics:
\[p(\theta | y) \propto p(y | \theta) \times p(\theta)\]
“The posterior is the prior times the likelihood…”
Bayesian credible intervals are straightforward to interpret: \(\theta\) is in \(I\) with probability \(\alpha\).
Choose \(I\) such that \[p(\theta \in I | \mathbf{y}) = \alpha.\]
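Given Monte Carlo samples from a posterior, an equal-tailed credible interval is just a pair of quantiles. A sketch (the Beta distribution here is a stand-in for any sampled posterior):

```julia
using Distributions, Statistics

# example posterior samples (a Beta distribution as a stand-in for any posterior)
samples = rand(Beta(12, 5), 100_000)

# equal-tailed 95% credible interval: trim (1 - α)/2 probability from each tail
α = 0.95
lo, hi = quantile(samples, (1 - α) / 2), quantile(samples, 1 - (1 - α) / 2)
```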
A fully specified Bayesian model includes a likelihood (the data-generating process) and prior distributions over all parameters.
Think: Prior provides proposed explanations, likelihood re-weights based on ability to produce the data.
Bayesian models lend themselves towards generative simulation by generating new data \(\tilde{y}\) through the posterior predictive distribution:
\[p(\tilde{y} | \mathbf{y}) = \int_{\Theta} p(\tilde{y} | \theta) p(\theta | \mathbf{y}) d\theta\]
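In practice this integral is approximated by Monte Carlo: draw \(\theta\) from the posterior, then draw \(\tilde{y}\) from the likelihood at that \(\theta\). A sketch using a Beta posterior and Bernoulli likelihood as stand-ins:

```julia
using Distributions, Statistics

# posterior over θ (stand-in: a Beta distribution)
posterior = Beta(12, 5)

# posterior predictive samples: marginalize over θ by simulation
y_tilde = [rand(Bernoulli(rand(posterior))) for _ in 1:100_000]

# predictive probability of heads ≈ posterior mean of θ
mean(y_tilde)
```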
One perspective: Priors should reflect “actual knowledge” independent of the analysis (Jaynes, 2003)
Another: Priors are part of the probability model, and can be specified/changed accordingly based on predictive skill (Gelman et al., 2017; Gelman & Shalizi, 2013)
We would like to understand if a coin-flipping game is fair. We’ve observed the following sequence of flips:
The data-generating process here is straightforward: we can represent a coin flip with a heads-probability of \(\theta\) as a sample from a Bernoulli distribution,
\[y_i \sim \text{Bernoulli}(\theta).\]
Suppose that we spoke to a friend who knows something about coins, and she tells us that it is extremely difficult to make a passable weighted coin which comes up heads more than 75% of the time.
Since \(\theta\) is bounded between 0 and 1, we’ll use a Beta distribution for our prior, specifically \(\text{Beta}(4,4)\).
Combining using Bayes’ rule lets us calculate the maximum a posteriori (MAP) estimate:
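The plotting code that follows references `flip_lposterior`, `θ_map`, and `θ_mle` without defining them; a minimal sketch of those definitions (the helper names `flip_ll` and `flip_lprior` are assumptions):

```julia
using Distributions, Statistics, Optim

# observed flips: 8 heads, 1 tail
flips = ["H", "H", "H", "T", "H", "H", "H", "H", "H"]
y = flips .== "H"

flip_ll(θ) = sum(logpdf.(Bernoulli(θ), y))        # log-likelihood
flip_lprior(θ) = logpdf(Beta(4, 4), θ)            # log-prior
flip_lposterior(θ) = flip_ll(θ) + flip_lprior(θ)  # unnormalized log-posterior

θ_mle = mean(y)  # MLE: the sample frequency of heads
θ_map = Optim.minimizer(optimize(θ -> -flip_lposterior(θ), 0.0, 1.0))
```

Note that the prior pulls the MAP estimate (11/15 ≈ 0.73) toward 0.5 relative to the MLE (8/9 ≈ 0.89).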
θ_range = 0:0.01:1
plot(θ_range, flip_lposterior.(θ_range), color=:black, label="Posterior", linewidth=3, tickfontsize=16, legendfontsize=16, guidefontsize=18, bottom_margin=5mm, left_margin=5mm)
vline!([θ_map], color=:red, label="MAP", linewidth=2)
vline!([θ_mle], color=:blue, label="MLE", linewidth=2)
xlabel!(L"$\theta$")
ylabel!("Posterior Density")
plot!(size=(1000, 450))
Frequentist: Parametric uncertainty is purely the result of sampling variability.
Bayesian: Parameters have probabilities based on consistency with data and priors.
Think: how “likely” is a set of parameters to have produced the data given the specified data generating process?
using CSV, DataFrames, DataFramesMeta, Dates

# read in data and get annual maxima
function load_data(fname)
    # DataFramesMeta.jl's @chain strings together commands to load and process the data
    df = @chain fname begin
        CSV.read(DataFrame; header=false)
        rename("Column1" => "year", "Column2" => "month", "Column3" => "day", "Column4" => "hour", "Column5" => "gauge")
        # combine the date/time columns into a single DateTime
        @transform :datetime = DateTime.(:year, :month, :day, :hour)
        # replace the -99999 missing-data flag with missing
        @transform :gauge = ifelse.(abs.(:gauge) .>= 9999, missing, :gauge)
        select(:datetime, :gauge)
    end
    return df
end
dat = load_data("data/surge/h551.csv")
# detrend the data to remove the effects of sea-level rise and seasonal dynamics
ma_length = 366
ma_offset = Int(floor(ma_length/2))
moving_average(series,n) = [mean(@view series[i-n:i+n]) for i in n+1:length(series)-n]
dat_ma = DataFrame(datetime=dat.datetime[ma_offset+1:end-ma_offset], residual=dat.gauge[ma_offset+1:end-ma_offset] .- moving_average(dat.gauge, ma_offset))
# group data by year and compute the annual maxima
dat_ma = dropmissing(dat_ma) # drop missing data
dat_annmax = combine(dat_ma -> dat_ma[argmax(dat_ma.residual), :], groupby(transform(dat_ma, :datetime => x->year.(x)), :datetime_function))
deleteat!(dat_annmax, nrow(dat_annmax)) # drop 2023; haven't seen much of that year yet
rename!(dat_annmax, :datetime_function => :Year)
select!(dat_annmax, [:Year, :residual])
dat_annmax.residual = dat_annmax.residual / 1000 # convert to m
# make plots
p1 = plot(
dat_annmax.Year,
dat_annmax.residual;
xlabel="Year",
ylabel="Annual Max Tide Level (m)",
label=false,
marker=:circle,
markersize=5,
tickfontsize=16,
guidefontsize=18
)
p2 = histogram(
dat_annmax.residual,
normalize=:pdf,
orientation=:horizontal,
label=:false,
xlabel="PDF",
ylabel="",
yticks=[],
tickfontsize=16,
guidefontsize=18
)
l = @layout [a{0.7w} b{0.3w}]
plot(p1, p2; layout=l, link=:y, ylims=(1, 1.7), bottom_margin=5mm, left_margin=5mm)
plot!(size=(1000, 450))
\[ \begin{align*} & y \sim \text{LogNormal}(\mu, \sigma) \tag{likelihood}\\ & \left. \begin{aligned} & \mu \sim \text{Normal}(0, 1) \\ & \sigma \sim \text{HalfNormal}(0, 5) \end{aligned} \right\} \tag{priors} \end{align*} \]
Want to find:
\[p(\mu, \sigma | y) \propto p(y | \mu, \sigma) p(\mu)p(\sigma)\]
Key idea: what do the priors imply for observable variables?
Let’s simulate data from the prior predictive distribution to see whether we get plausible outcomes.
\[\tilde{y} \sim p(\tilde{y}) = \int_{\Theta} p(\tilde{y} | \theta) p(\theta) d\theta\]
# sample from priors
μ_sample = rand(Normal(0, 1), 1_000)
σ_sample = rand(truncated(Normal(0, 5), 0, +Inf), 1_000)
# define return periods and compute return levels for the sampled parameters
return_periods = 2:100
return_levels = zeros(1_000, length(return_periods))
for i in 1:1_000
return_levels[i, :] = quantile.(LogNormal(μ_sample[i], σ_sample[i]), 1 .- (1 ./ return_periods))
end
plt_prior_1 = plot(; yscale=:log10, yticks=10 .^ collect(0:2:16), ylabel="Return Level (m)", xlabel="Return Period (yrs)",
tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=10mm, left_margin=10mm, legend=:topleft)
for idx in 1:1_000
label = idx == 1 ? "Prior" : false
plot!(plt_prior_1, return_periods, return_levels[idx, :]; color=:black, alpha=0.1, label=label)
end
plt_prior_1
\[ \begin{align*} & y \sim \text{LogNormal}(\mu, \sigma) \tag{likelihood}\\ & \left. \begin{aligned} & \mu \sim \text{Normal}(0, 0.5) \\ & \sigma \sim \text{HalfNormal}(0, 0.1) \end{aligned} \right\} \tag{priors} \end{align*} \]
# sample from priors
μ_sample = rand(Normal(0, 0.5), 1_000)
σ_sample = rand(truncated(Normal(0, 0.1), 0, +Inf), 1_000)
return_periods = 2:100
return_levels = zeros(1_000, length(return_periods))
for i in 1:1_000
return_levels[i, :] = quantile.(LogNormal(μ_sample[i], σ_sample[i]), 1 .- (1 ./ return_periods))
end
plt_prior_2 = plot(; ylabel="Return Level (m)", xlabel="Return Period (yrs)", tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=10mm, left_margin=10mm)
for idx in 1:1_000
label = idx == 1 ? "Prior" : false
plot!(plt_prior_2, return_periods, return_levels[idx, :]; color=:black, alpha=0.1, label=label)
end
plt_prior_2
ll(μ, σ) = sum(logpdf.(LogNormal(μ, σ), dat_annmax.residual))
lprior1(μ, σ) = logpdf(Normal(0, 1), μ) + logpdf(truncated(Normal(0, 5), 0, Inf), σ)
lprior2(μ, σ) = logpdf(Normal(0, 0.5), μ) + logpdf(truncated(Normal(0, 0.1), 0, Inf), σ)
lposterior1(μ, σ) = ll(μ, σ) + lprior1(μ, σ)
lposterior2(μ, σ) = ll(μ, σ) + lprior2(μ, σ)
p_map1 = optimize(p -> -lposterior1(p[1], p[2]), [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]).minimizer
p_map2 = optimize(p -> -lposterior2(p[1], p[2]), [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]).minimizer
μ = 0.15:0.005:0.35
σ = 0.04:0.01:0.1
posterior1_vals = @. lposterior1(μ', σ)
posterior2_vals = @. lposterior2(μ', σ)
p_post1 = contour(μ, σ, posterior1_vals,
levels=100,
clabels=false,
cbar=false, lw=1,
fill=(true,cgrad(:grays,[0,0.1,1.0])),
title = "Diffuse Prior"
)
scatter!(p_post1, [p_map1[1]], [p_map1[2]], label="MAP", markersize=10, marker=:star)
xlabel!(p_post1, L"$\mu$")
ylabel!(p_post1, L"$\sigma$")
plot!(p_post1, size=(600, 500))
p_post2 = contour(μ, σ, posterior2_vals,
levels=100,
clabels=false,
cbar=false, lw=1,
fill=(true,cgrad(:grays,[0,0.1,1.0])),
title = "More Informed Priors"
)
scatter!(p_post2, [p_map2[1]], [p_map2[2]], label="MAP", markersize=10, marker=:star)
xlabel!(p_post2, L"$\mu$")
ylabel!(p_post2, L"$\sigma$")
plot!(p_post2, size=(600, 500))
display(p_post1)
display(p_post2)
p_map1 = [0.25470601822948175, 0.05548961712923218]
p_map2 = [0.2546874075933288, 0.05542213686160764]
p1 = histogram(
dat_annmax.residual,
normalize=:pdf,
legend=:false,
ylabel="PDF",
xlabel="Annual Max Tide Level (m)",
tickfontsize=16,
guidefontsize=18,
bottom_margin=5mm, left_margin=5mm
)
plot!(p1, LogNormal(p_map2[1], p_map2[2]),
linewidth=3,
color=:red)
xlims!(p1, (1, 1.7))
plot!(p1, size=(600, 450))
return_periods = 2:500
return_levels = quantile.(LogNormal(p_map2[1], p_map2[2]), 1 .- (1 ./ return_periods))
# compute exceedance probabilities and plotting positions from the data quantiles
function exceedance_plot_pos(y)
    N = length(y)
    ys = sort(y) # sorted (ascending) values of y
    nxp = [r / (N + 1) for r in 1:N] # Weibull plotting positions (non-exceedance probabilities)
    xp = 1 .- nxp # exceedance probabilities
    return xp, ys
end
xp, ys = exceedance_plot_pos(dat_annmax.residual)
p2 = plot(return_periods, return_levels, linewidth=3, color=:blue, label="Model Fit", tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=5mm, left_margin=5mm, right_margin=10mm, legend=:bottomright)
scatter!(p2, 1 ./ xp, ys, label="Observations", color=:black, markersize=5)
xlabel!(p2, "Return Period (yrs)")
ylabel!(p2, "Return Level (m)")
xlims!(-1, 300)
plot!(p2, size=(600, 450))
display(p1)
display(p2)
One of the main benefits of Bayesian statistics is that we obtain a full probability distribution over parameters, not just a point estimate.
Sampling from this distribution is often more involved.
When the mathematical forms of the likelihood and the prior(s) are conjugate, the posterior is a nice closed-form distribution.
Examples: a Beta prior with a Bernoulli/Binomial likelihood; a Normal prior with a Normal likelihood (known variance); a Gamma prior with a Poisson likelihood.
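For example, the Beta-Bernoulli update is available in closed form: a \(\text{Beta}(\alpha, \beta)\) prior with \(y\) heads in \(n\) flips gives a \(\text{Beta}(\alpha + y, \beta + n - y)\) posterior. A sketch using the coin-flip numbers from earlier:

```julia
using Distributions

# Beta is conjugate to Bernoulli: no numerical optimization or sampling required
prior = Beta(4, 4)
heads, n = 8, 9  # the coin-flip data from earlier

posterior = Beta(prior.α + heads, prior.β + (n - heads))
θ_map = mode(posterior)  # closed-form MAP estimate
```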
Sampling each parameter in turn from its conjugate full conditional distribution is called Gibbs sampling.
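As a concrete (if simplified) illustration, here is a Gibbs sampler for a Normal model with conditionally conjugate priors, alternating draws from each full conditional. The priors, synthetic data, and helper name are illustrative assumptions, not the lecture's storm-surge model:

```julia
using Distributions, Statistics, Random

# Gibbs sampling for y ~ Normal(μ, σ) with conditionally conjugate priors
# μ ~ Normal(m0, sqrt(s0²)) and σ² ~ InverseGamma(a0, b0)
function gibbs(y; m0=0.0, s0²=10.0, a0=2.0, b0=1.0, n_iter=5_000)
    n, ȳ = length(y), mean(y)
    μs, σ²s = zeros(n_iter), zeros(n_iter)
    μ, σ² = ȳ, var(y)  # initialize at the data moments
    for i in 1:n_iter
        # μ | σ², y ~ Normal (conjugate full conditional)
        v = 1 / (1 / s0² + n / σ²)
        μ = rand(Normal(v * (m0 / s0² + n * ȳ / σ²), sqrt(v)))
        # σ² | μ, y ~ InverseGamma (conjugate full conditional)
        σ² = rand(InverseGamma(a0 + n / 2, b0 + sum((y .- μ) .^ 2) / 2))
        μs[i], σ²s[i] = μ, σ²
    end
    return μs, σ²s
end

Random.seed!(1)
y = rand(Normal(2.0, 0.5), 500)
μs, σ²s = gibbs(y)
```

Because each full conditional is a standard distribution, every draw is exact; the cost is that conjugacy restricts the model forms we can use.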
In general, priors matter more for small datasets, weakly identified parameters, and extrapolation into the tails (e.g. long return periods).
Always justify and test your priors. Explicitly compare the prior to the posterior to see whether your inferences are driven by the prior or by the data (probability model).
Next Week: Sampling! Specifically, Monte Carlo.
Homework 2 due Friday (2/21).
Quiz: Due Monday (all on today’s lecture).
Project: Will discuss Monday, start thinking about possible topics.